inactive feature
A Safe Screening Rule for Sparse Logistic Regression
Jie Wang, Jiayu Zhou, Jun Liu, Peter Wonka, Jieping Ye
Although many recent efforts have been devoted to its efficient implementation, its application to high dimensional data still poses significant challenges. In this paper, we present a fast and effective sparse lo gistic regression s creening rule (Slores) to identify the "0" components in the solution vector, which may lead to a substantial reduction in the number of features to be entered to the optimization. An appealing feature of Slores is that the data set needs to be scanned only once to run the screening and its computational cost is negligible compared to that of solving the sparse logistic regression problem. Moreover, Slores is independent of solvers for sparse logistic regression, thus Slores can be integrated with any existing solver to improve the efficiency. We have evaluated Slores using high-dimensional data sets from different applications. Experiments demonstrate that Slores outperforms the existing state-of-the-art screening rules and the efficiency of solving sparse logistic regression can be improved by one magnitude.
- North America > United States > Arizona > Maricopa County > Tempe (0.05)
- North America > United States > North Carolina > Wake County > Cary (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- (9 more...)
Sparse minimum Redundancy Maximum Relevance for feature selection
Naylor, Peter, Poignard, Benjamin, Climente-González, Héctor, Yamada, Makoto
We propose a feature screening method that integrates both feature-feature and feature-target relationships. Inactive features are identified via a penalized minimum Redundancy Maximum Relevance (mRMR) procedure, which is the continuous version of the classic mRMR penalized by a non-convex regularizer, and where the parameters estimated as zero coefficients represent the set of inactive features. We establish the conditions under which zero coefficients are correctly identified to guarantee accurate recovery of inactive features. We introduce a multi-stage procedure based on the knockoff filter enabling the penalized mRMR to discard inactive features while controlling the false discovery rate (FDR). Our method performs comparably to HSIC-LASSO but is more conservative in the number of selected features. It only requires setting an FDR threshold, rather than specifying the number of features to retain. The effectiveness of the method is illustrated through simulations and real-world datasets. The code to reproduce this work is available on the following GitHub: https://github.com/PeterJackNaylor/SmRMR.
- Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets
However, in large-scale applications, the complexity of the regularizers entails great computational challenges. In this paper, we propose a novel two-layer feature reduction method (TLFre) for SGL via a decomposition of its dual feasible set. The two-layer reduction is able to quickly identify the inactive groups and the inactive features, respectively, which are guaranteed to be absent from the sparse representation and can be removed from the optimization. Existing feature reduction methods are only applicable for sparse models with one sparsity-inducing regularizer. To our best knowledge, TLFre is the first one that is capable of dealing with multiple sparsity-inducing regularizers. Moreover, TLFre has a very low computational cost and can be integrated with any existing solvers. Experiments on both synthetic and real data sets show that TLFre improves the efficiency of SGL by orders of magnitude.
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
A Safe Screening Rule for Sparse Logistic Regression
Jie Wang, Jiayu Zhou, Jun Liu, Peter Wonka, Jieping Ye
Although many recent efforts have been devoted to its efficient implementation, its application to high dimensional data still poses significant challenges. In this paper, we present a fast and effective sparse logistic regression screening rule (Slores) to identify the "0" components in the solution vector, which may lead to a substantial reduction in the number of features to be entered to the optimization. An appealing feature of Slores is that the data set needs to be scanned only once to run the screening and its computational cost is negligible compared to that of solving the sparse logistic regression problem. Moreover, Slores is independent of solvers for sparse logistic regression, thus Slores can be integrated with any existing solver to improve the efficiency. We have evaluated Slores using high-dimensional data sets from different applications. Experiments demonstrate that Slores outperforms the existing state-of-the-art screening rules and the efficiency of solving sparse logistic regression can be improved by one magnitude.
- North America > United States > Arizona > Maricopa County > Tempe (0.05)
- North America > United States > North Carolina > Wake County > Cary (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Lasso Screening Rules via Dual Polytope Projection
Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues have proposed the SAFE rules which are able to quickly identify the inactive predictors, i.e., predictors that have 0 components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors include the set of inactive constraints on the optimal dual solution.
Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets
However, in large-scale applications, the complexity of the regularizers entails great computational challenges. In this paper, we propose a novel two-layer feature reduction method (TLFre) for SGL via a decomposition of its dual feasible set. The two-layer reduction is able to quickly identify the inactive groups and the inactive features, respectively, which are guaranteed to be absent from the sparse representation and can be removed from the optimization. Existing feature reduction methods are only applicable for sparse models with one sparsity-inducing regularizer. To our best knowledge, TLFre is the first one that is capable of dealing with multiple sparsity-inducing regularizers. Moreover, TLFre has a very low computational cost and can be integrated with any existing solvers. Experiments on both synthetic and real data sets show that TLFre improves the efficiency of SGL by orders of magnitude.
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
A Safe Screening Rule for Sparse Logistic Regression
Although many recent efforts have been devoted to its efficient implementation, its application to high dimensional data still poses significant challenges. In this paper, we present a fast and effective sparse logistic regression screening rule (Slores) to identify the "0" components in the solution vector, which may lead to a substantial reduction in the number of features to be entered to the optimization. An appealing feature of Slores is that the data set needs to be scanned only once to run the screening and its computational cost is negligible compared to that of solving the sparse logistic regression problem. Moreover, Slores is independent of solvers for sparse logistic regression, thus Slores can be integrated with any existing solver to improve the efficiency. We have evaluated Slores using high-dimensional data sets from different applications. Experiments demonstrate that Slores outperforms the existing state-of-the-art screening rules and the efficiency of solving sparse logistic regression can be improved by one magnitude.
- North America > United States > Arizona > Maricopa County > Tempe (0.05)
- North America > United States > North Carolina > Wake County > Cary (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)